The FreeBSD Diary (TM)
ZFS: do not give it all your HDD
1 August 2010
I'm about to rebuild my ZFS array (which I documented in my other diary). The array has been running for a while, but I recently learned some new facts about ZFS that spurred me on to rebuild it with future-proofing in mind. This is my plan for tonight. As I type this, Jerry is over, doing the heavy lifting for me; I am nursing a broken left elbow. The two new HDD have been installed and the system has been powered back up. Tonight we will do the following:
This article originally covered all of the above steps. That soon led to a multi-day, 3000-line document, so I thought it best to break it up into a few smaller articles. This article covers only the part about partitioning HDD to avoid future problems at replacement time.
Don't use all your HDD
In this section, let's assume you are building a new ZFS array. I will talk about how I like to partition my HDD with a little buffer zone, and why. Let's assume ada0 and ada6 are the drives you want to use. This is the list of ada devices from dmesg:

    ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
    ada7: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
    ada8: 152587MB (312500000 512 byte sectors: 16H 63S/T 16383C)

As you can see, each of the 2TB devices (ada0 through ada6) contains 3907029168 sectors of 512 bytes each, for a total of 1863 GB on a 2TB HDD. (The "2TB" on the label is decimal: 3907029168 × 512 = 2,000,398,934,016 bytes, which is 1863 GB when a GB is counted as 1024^3 bytes.) However, not all 2TB HDD contain this same number of sectors. Even the same model of HDD can vary. If you are using ZFS, you should be aware of the following from man zpool:

    zpool replace [-f] pool old_device [new_device]

        Replaces old_device with new_device. This is equivalent to attaching
        new_device, waiting for it to resilver, and then detaching
        old_device.

        The size of new_device must be greater than or equal to the minimum
        size of all the devices in a mirror or raidz configuration.

Thus, if your replacement HDD is just 1 sector smaller than the original, you cannot use it.

But there is a cunning plan: partition the HDD and give only the partition to ZFS. This won't help you in hindsight if your existing array is broken now; the strategy is only useful when setting up a new array. The idea is to use slightly less than your entire HDD, so that if a replacement HDD happens to be slightly smaller, you're covered.
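The arithmetic behind those numbers, and the size rule from the man page, can be sketched in a few lines of sh. This is a minimal sketch, assuming 512-byte sectors; sectors_to_gib, sectors_to_gb and check_replacement are hypothetical helpers, and check_replacement is a simplified stand-in for the rule quoted above, not the check zpool actually performs internally.

```sh
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not FreeBSD tools.
# Sector counts come from the dmesg lines above.

SECTOR_BYTES=512

# Convert a sector count to whole GB of 1024^3 bytes, rounding down.
sectors_to_gib() {
    echo $(( $1 * SECTOR_BYTES / 1073741824 ))
}

# Convert a sector count to whole GB of 1000^3 bytes, rounding down.
sectors_to_gb() {
    echo $(( $1 * SECTOR_BYTES / 1000000000 ))
}

# Simplified version of the zpool replace rule: the replacement must be at
# least as large as the smallest existing member.
# Arguments: smallest member size, replacement size (both in sectors).
check_replacement() {
    if [ "$2" -ge "$1" ]; then
        echo ok
    else
        echo "too small"
    fi
}

original=3907029168                                # ada0..ada6 from dmesg
sectors_to_gib "$original"                         # 1863, the article's figure
sectors_to_gb "$original"                          # 2000, the label's "2TB"
check_replacement "$original" "$original"          # ok
check_replacement "$original" $(( original - 1 ))  # too small
```

One sector short is enough to fail the comparison, which is exactly the situation the partitioning below is meant to avoid.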
Using gpart
There are other approaches to this, but I'm using gpart.

    # gpart create -s GPT ad1
    gpart: provider 'ad1': Invalid argument

Oh. Yes, wrong name. Let's try this:

    # gpart create -s GPT ada0
    #

Now let's see what we have:

    # gpart show ada0
    =>        34  3907029101  ada1  GPT  (1.8T)
              34  3907029101        - free -  (1.8T)

From the above, we can see one free area of 3907029101 sectors, starting at sector 34. Each sector is 512 bytes, as the sector size line here shows:

    # camcontrol identify ada0
    pass0: <Hitachi HDS722020ALA330 JKAOA28A> ATA-8 SATA 2.x device
    pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

    protocol              ATA/ATAPI-8 SATA 2.x
    device model          Hitachi HDS722020ALA330
    firmware revision     JKAOA28A
    serial number         JK1131YAHLJWLV
    WWN                   5000cca221d68596
    cylinders             16383
    heads                 16
    sectors/track         63
    sector size           logical 512, physical 512, offset 0
    LBA supported         268435455 sectors
    LBA48 supported       3907029168 sectors
    PIO supported         PIO4
    DMA supported         WDMA2 UDMA6
    media RPM             7200

    Feature                      Support  Enable    Value           Vendor
    read ahead                     yes      yes
    write cache                    yes      yes
    flush cache                    yes      yes
    overlap                        no
    Tagged Command Queuing (TCQ)   no       no
    Native Command Queuing (NCQ)   yes              32 tags
    SMART                          yes      yes
    microcode download             yes      yes
    security                       yes      no
    power management               yes      yes
    advanced power management      yes      no      0/0x00
    automatic acoustic management  yes      no      254/0xFE        128/0x80
    media status notification      no       no
    power-up in Standby            yes      no
    write-read-verify              no       no      0/0x0
    unload                         no       no
    free-fall                      no       no
    data set management (TRIM)     no

I plan to leave 200MB free at the end of each HDD. Thus, the gpart command to add a new partition is:

    gpart add -b 2048 -s 3906824301 -t freebsd-zfs -l disk00 ada0

Please note that the above math is incorrect, but only slightly. It subtracts only 204800 sectors (200 × 1024, which is 100MB, not 200MB) from the 3907029101 free sectors and ignores the 2048-sector starting offset, so it leaves some 99MB free, which is completely acceptable for this effort. The correct math is:

    gpart add -b 2048 -s 3906617453 -t freebsd-zfs -l disk00 ada0

where 3906617453 = 3907029101 - 2048 - 409600: the free sectors reported by gpart show, less the 2048-sector starting offset, less 200MB expressed in 512-byte sectors (200 × 1024 × 1024 / 512 = 409600).
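The partition-size arithmetic can be sketched in sh as well. This is a minimal sketch, assuming the gpart show numbers above (a free area starting at sector 34 and 3907029101 sectors long) and 512-byte sectors; the helper names are hypothetical. It also shows that the corrected command actually leaves 34 sectors more than 200MB unused, because the free area starts at sector 34 rather than 0.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical. Numbers are taken from
# "gpart show ada0" above.

FREE_START=34            # first free sector reported by gpart show
FREE_SECTORS=3907029101  # length of the free area reported by gpart show
START=2048               # -b value used in the article
SECTOR_BYTES=512

# Convert MB (1024*1024 bytes) to 512-byte sectors.
mb_to_sectors() {
    echo $(( $1 * 1024 * 1024 / SECTOR_BYTES ))
}

# The article's corrected formula: free sectors - start - reserve.
article_size() {
    echo $(( FREE_SECTORS - START - $(mb_to_sectors "$1") ))
}

# Exact version: count only the free sectors from START onward.
exact_size() {
    echo $(( FREE_START + FREE_SECTORS - START - $(mb_to_sectors "$1") ))
}

# Sectors left unused at the end of the disk for a given -s value.
left_free() {
    echo $(( FREE_START + FREE_SECTORS - START - $1 ))
}

mb_to_sectors 200         # 409600
article_size 200          # 3906617453, the -s value in the article
exact_size 200            # 3906617487
left_free 3906824301      # 202786 sectors (about 99MB), the first attempt
left_free 3906617453      # 409634 sectors, i.e. 200MB plus 34 sectors
```

Either corrected value works for this purpose; what matters is that every member partition gets the same -s value.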
Creating the pool
Let's assume we did the above with 5 HDD. The command to create the new pool is:

    # zpool create -f storage raidz2 gpt/disk00 gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04
    # zpool status
      pool: storage
     state: ONLINE
     scrub: none requested
    config:

            NAME            STATE     READ WRITE CKSUM
            storage         ONLINE       0     0     0
              raidz2        ONLINE       0     0     0
                gpt/disk00  ONLINE       0     0     0
                gpt/disk01  ONLINE       0     0     0
                gpt/disk02  ONLINE       0     0     0
                gpt/disk03  ONLINE       0     0     0
                gpt/disk04  ONLINE       0     0     0

    errors: No known data errors
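Repeating the partitioning for each of the five drives is mechanical, so here is a dry-run sketch that only prints the commands rather than running them. The device names ada0 through ada4 and the function name print_setup are assumptions for illustration; the -b and -s values are the ones derived above and assume every drive is at least as large as ada0. Review the printed commands before typing any of them.

```sh
#!/bin/sh
# Dry run: print (do not execute) the gpart and zpool commands for a
# five-disk raidz2 pool. print_setup and the device names are hypothetical.

SIZE=3906617453   # -s value from the corrected math above
START=2048        # -b value from the article

print_setup() {
    pool=$1
    shift
    n=0
    members=""
    for dev in "$@"; do
        label=$(printf 'disk%02d' "$n")   # disk00, disk01, ...
        echo "gpart create -s GPT $dev"
        echo "gpart add -b $START -s $SIZE -t freebsd-zfs -l $label $dev"
        members="$members gpt/$label"
        n=$((n + 1))
    done
    echo "zpool create -f $pool raidz2$members"
}

print_setup storage ada0 ada1 ada2 ada3 ada4
```

The last line printed is the same zpool create command used above, built from the GPT labels rather than the raw device names.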
There, done.
When it comes time to replace one of the above devices, let's say gpt/disk02, you do this:

    # zpool offline storage gpt/disk02

Then you remove that HDD from the system and insert the new HDD. You partition the new HDD just as you did above (gpart create, then gpart add with the same -b and -s values and the label disk02), checking the math against the new drive's sector count, and you have a new partition exactly the same size as all the others. Now put that disk back into the pool:

    # zpool replace storage gpt/disk02

Done. Let the array resilver, and you're good to go. Hopefully, this approach will save us both from headaches in the future.
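Whether a particular replacement drive is covered by the buffer can be checked before running zpool replace. This is a minimal sketch, assuming a standard GPT layout in which the backup table occupies the last 33 sectors of the disk (consistent with the gpart show output above, whose free area ends at sector 3907029134); fits_on_disk and the example sector counts are hypothetical.

```sh
#!/bin/sh
# Sketch only: fits_on_disk is a hypothetical helper. Assumes the backup
# GPT header and entries use the last 33 sectors, as on the disk above.

START=2048        # -b value used for every member
SIZE=3906617453   # -s value used for every member

# Print "fits" if a partition of SIZE sectors starting at START fits on a
# disk with the given total sector count, else "does not fit".
fits_on_disk() {
    last_usable=$(( $1 - 34 ))   # last sector before the backup GPT
    if [ $(( START + SIZE - 1 )) -le "$last_usable" ]; then
        echo fits
    else
        echo "does not fit"
    fi
}

fits_on_disk 3907029168                   # identical drive: fits
fits_on_disk $(( 3907029168 - 300000 ))   # about 146MB smaller: fits
fits_on_disk $(( 3907029168 - 500000 ))   # about 244MB smaller: does not fit
```

With these numbers the buffer is 409634 sectors, so a replacement drive may be up to roughly 200MB smaller than the original and still take the same partition.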